The article introduces the Chonky multilingual transformer model, designed to intelligently segment text into meaningful semantic chunks for use in retrieval-augmented generation (RAG) systems. It provides usage examples in Python, demonstrating how to implement the model for text processing and named entity recognition (NER) tasks. The model is fine-tuned for sequences of length 1024 and shows strong performance across various languages.
multilingual ✓
text segmentation ✓
transformer model ✓